Genetic Variants Associated with Chronic Obstructive Pulmonary Disease Risk: Cumulative Epidemiological Evidence from Meta-Analyses and Genome-Wide Association Studies

Background. Over the last two decades, many association studies on genetic variants and chronic obstructive pulmonary disease (COPD) risk have been published, but results from different studies are inconsistent. We therefore systematically evaluated results from previous meta-analyses and genome-wide association studies (GWASs). Material and Methods. First, we retrieved meta-analyses from PubMed, Embase, and China National Knowledge Infrastructure and GWASs from PubMed and the GWAS catalog published on or before April 7th, 2022. Then, data were extracted and screened. Finally, two main methods, the Venice criteria and the false-positive report probability (FPRP) test, were used to evaluate significant associations. Results. Eighty-eight meta-analyses and 5 GWASs were deemed eligible for inclusion. Fifty variants in 26 genes obtained from meta-analyses were significantly associated with COPD risk. Cumulative epidemiological evidence of an association was graded as strong for 10 variants in 8 genes (GSTM1, CHRNA, ADAM33, SP-D, TNF-α, VDBP, HMOX1, and HHIP), moderate for 6 variants in 5 genes (PI, GSTM1, ADAM33, TNF-α, and VDBP), and weak for 40 variants in 23 genes. Five variants in 4 genes showed convincing evidence of no association with COPD risk in meta-analyses. Additionally, 29 SNPs identified in GWASs were noteworthy based on the FPRP test. Conclusion. In summary, more than half (52.38%) of genetic variants reported in previous meta-analyses showed no association with COPD risk. However, 13 variants in 9 genes had moderate to strong evidence for an association. This article can serve as a useful reference for further studies.


Background
Chronic obstructive pulmonary disease (COPD) is a chronic inflammatory disease with progressive limitation of airflow [1]. The WHO predicts that COPD will become the third leading cause of death worldwide by 2030, so it remains a major health threat and a heavy burden on health care resources [2]. Smoking and air pollution are widely believed to be the main risk factors for COPD, but only 15-20% of smokers develop the disease and many COPD cases cannot be explained by environmental risk factors alone [3,4], suggesting that other factors such as genetic variation also contribute to its development [5]. Moreover, the clinical presentation and severity of COPD differ between races, and COPD cases show familial clustering, which further supports a genetic contribution [6,7]. As early as 2009, Smolonska et al. [8] conducted meta-analyses of 20 polymorphisms in 12 genes and demonstrated that three polymorphisms in TGF-β1 (rs2241712, rs1982073, and rs6957) were significantly associated with COPD risk in the "diverse populations." Another three polymorphisms were significant only in Asians (IL1RN rs2234663, TNF-α rs1800629, and GSTP1 rs1695). One year later, Castaldi et al. [9] added three more loci (GSTM1 null variant, TGF-β1 rs1800470, and SOD3 rs1799896) related to COPD risk. Follow-up studies reported other polymorphisms, mainly in IL, MMP, ADRB2, CHRNA, ADAM33, VDBP, SP-A/B/D, and COX2.
Over the last two decades, a surge of studies investigating genetic polymorphisms and COPD risk has been published, including genome-wide association studies (GWASs), candidate-gene association studies, and meta-analyses. Covering the whole genome, GWASs test millions of single nucleotide polymorphisms (SNPs) and are a powerful and efficient tool for identifying associations between genetic variants and complex diseases [10]. In contrast, most candidate-gene association studies are underpowered to detect moderate-sized genetic effects. Meta-analyses, which systematically integrate data from individual studies, have the advantage of reaching a single conclusion with greater statistical power [11]. Despite the prominence and growing number of results from GWASs, candidate-gene association studies are still one of the main methods for identifying common COPD susceptibility alleles. Some of the associations reported in candidate-gene association studies may be true; however, inconsistent results across studies suggest that some are false positives. Even results from meta-analyses sometimes cannot be replicated in follow-up studies, indicating that these associations lack robustness. Guidelines for assessing the cumulative epidemiological evidence of genetic associations, known as the Venice criteria, were proposed by the Human Genome Epidemiology Network (HuGENet) in 2008 [12]. These guidelines help evaluate gene-disease associations [13]. However, as far as we know, the current literature contains no integrative and updated evaluation of all the reported polymorphisms associated with COPD. Therefore, we collected evidence from meta-analyses and GWASs to conduct an integrative assessment of gene-COPD associations based on the Venice criteria and the false-positive report probability (FPRP) test, thus providing a synopsis of current understanding of the genetic basis of COPD risk.

Study Eligibility and Literature Search.
A systematic strategy was used to identify all relevant publications. First, for meta-analyses, we searched PubMed, Embase, and China National Knowledge Infrastructure (CNKI) using the terms "COPD or chronic obstructive pulmonary disease," "genetic or genetic association or single nucleotide polymorphism or SNP or polymorphism or genotype or variant or mutation or susceptibility," and "meta-analysis or systematic review or literature review." For GWASs, we searched PubMed using the terms "COPD or chronic obstructive pulmonary disease" and "GWAS or genome-wide association study" and also checked the GWAS catalog. All searches covered publications on or before April 7th, 2022. Second, after manually screening titles and abstracts, we reviewed all references cited in relevant studies to identify additional studies. Inclusion criteria for meta-analyses were as follows: data were published in a peer-reviewed journal; studies used a cross-sectional, case-control, or cohort design; the association concerned the etiology of COPD; and the information necessary for the FPRP test and/or the Venice criteria was provided. GWASs were eligible for inclusion if they met the following criteria: the association directly concerned the etiology of COPD; the P value was < 5 × 10⁻⁸; and both discovery and replication phases were reported. We removed duplicated and unrelated articles by screening the title and abstract or by reading the full text if necessary. The literature search and study screening were done by Liu and Ran together. Any disagreement was resolved by consensus.

Data Extraction.
Two reviewers, Liu and Ran, extracted data separately and then exchanged and cross-checked the results. Any disagreement was resolved by consensus. We collected the following data from meta-analyses: PMID, gene name, genetic variant, first author, publication year, comparison model, ethnicity, OR and 95% CI, adjustment for smoking (Yes/No), I², the number of studies, cases, and controls, minor allele frequency (MAF) in controls, P value of Egger's test and P value of the Q test, and the number of test alleles or genotypes. We collected the following data from GWASs: PMID, gene name, genetic variant, first author, publication year, risk allele, ethnicity, OR and 95% CI, risk allele frequency, P value, and the number of cases and controls. Caucasian and Asian were the two major ethnicities reported. We defined ethnicity as "diverse populations" if a combination of two or more ethnicities was reported. We extracted significant results of subgroup analyses stratified by ethnicity. Cigarette smoking is the major environmental risk factor for COPD and may influence the distribution of genetic polymorphisms [14]; therefore, results adjusted for smoking were extracted as well. Except for several variants (e.g., PIMZ and TGF-β1 rs2241718) that were investigated exclusively by one meta-analysis, most of the variants were widely investigated. We extracted data from all of these meta-analyses, but only one meta-analysis per variant was chosen for further assessment, considering publication year, comparison model, sample size, between-study heterogeneity, and study design together. There was considerable overlap among these studies. To avoid confusion from differing variant nomenclature across articles, we used the uniform identifiers ("rs" numbers) of variants in dbSNP. For variants without an "rs" number, we used the common nomenclature (e.g., GSTT1 null/present and SERPINA3 Ala9Thr).
Associations were considered statistically significant if the reported P value was less than 0.05 or if the 95% CI excluded 1.0. Most of the meta-analyses provided multiple results under different genetic models; to avoid selection bias, we used the allelic model as the unified model. When data under the allelic model were not available, other models were used. When a meta-analysis did not provide sufficient information for the Venice criteria, statistical analyses were performed with Stata, version 12.0, using the original data provided in that meta-analysis.

Evaluation of Cumulative Evidence.
For statistically significant associations from meta-analyses, we applied the Venice criteria to grade the credibility of cumulative epidemiological evidence. The Venice criteria cover three aspects: the amount of evidence, replication of association, and protection from bias. For each of the three aspects, one of three levels (A, B, or C) was assigned, based on which credibility was defined as strong, moderate, or weak.
Amount of evidence was determined by the sum of test alleles or genotypes among cases and controls: A (n > 1000), B (n = 100-1000), and C (n < 100). We did not apply this criterion to rare variants with a frequency of less than 1%, since an A grade was unlikely to be obtained [15]. The replication of association was determined by between-study heterogeneity: A (I² < 25%), B (I² = 25-50%), and C (I² > 50%). Protection from bias takes various potential sources of bias into consideration. A grade of A was assigned when there was no demonstrable bias or the bias was unlikely to invalidate the association. A grade of B was assigned when there was no obvious bias but the information was insufficient to rule bias out. A grade of C was assigned when bias was evident and/or was likely to explain the presence of the association. Furthermore, a grade of C was assigned if any of the following occurred: the sensitivity analysis indicated that the significant results could be substantially changed; the magnitude of the association was low (e.g., OR < 1.15 and OR > 0.87 for risk effect and protective effect, respectively), unless the association had been replicated prospectively by GWASs or several studies with no evidence of publication bias; or there was obvious publication bias (Egger's test P value < 0.05). Cumulative epidemiological evidence was categorized as strong if all grades were A and weak if any grade was C. All other combinations were categorized as moderate. If the data were insufficient, associations were not evaluated (n = 6).
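The grading rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the handling of values that fall exactly on a threshold (100, 1000, 25%, 50%) is an assumption, and the protection-from-bias grade is taken as an input because it rests on reviewer judgment.

```python
from typing import Optional


def grade_amount(n_test_alleles: int) -> str:
    """Grade the amount of evidence from the number of test alleles or genotypes."""
    if n_test_alleles > 1000:
        return "A"
    if n_test_alleles >= 100:  # assumption: exactly 100 or 1000 counts as B
        return "B"
    return "C"


def grade_replication(i_squared_percent: float) -> str:
    """Grade replication of association from between-study heterogeneity (I², in %)."""
    if i_squared_percent < 25:
        return "A"
    if i_squared_percent <= 50:  # assumption: exactly 25% or 50% counts as B
        return "B"
    return "C"


def venice_category(amount: Optional[str], replication: str, bias: str) -> str:
    """Combine the three grades into strong, moderate, or weak.

    `amount` may be None for rare variants (frequency below 1%),
    for which the amount-of-evidence criterion is not applied.
    """
    grades = [g for g in (amount, replication, bias) if g is not None]
    if "C" in grades:
        return "weak"
    if all(g == "A" for g in grades):
        return "strong"
    return "moderate"


if __name__ == "__main__":
    # Hypothetical association: 2400 test alleles, I² = 18%, no demonstrable bias.
    print(venice_category(grade_amount(2400), grade_replication(18.0), "A"))
```

Passing `None` for the amount grade mirrors how a rare variant is handled: it is graded on the remaining two aspects only.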
An approach developed by Wacholder et al. [16] was also used to calculate the FPRP for all the significant associations. The FPRP is determined by three parameters: the observed P value, the statistical power, and the prior probability. We used a prior probability of 0.01 and preset the FPRP noteworthiness value at 0.2. FPRP values and statistical power were calculated with the Excel spreadsheet offered by Wacholder et al. We considered an association noteworthy if the calculated FPRP value was less than 0.2, which suggests a true association. FPRP < 0.05, 0.05 < FPRP < 0.2, and FPRP > 0.2 were considered strong, moderate, and weak evidence of a true association, respectively. We upgraded cumulative epidemiological evidence from moderate to strong, or from weak to moderate, if evidence based on the FPRP was strong. We downgraded cumulative epidemiological evidence from strong to moderate, or from moderate to weak, if evidence based on the FPRP was weak [17].
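The FPRP calculation and the subsequent grade adjustment can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the spreadsheet of Wacholder et al.: it takes the observed P value and the statistical power as given inputs, the function names are hypothetical, and assigning a value exactly equal to 0.05 or 0.2 to the weaker category is an assumption.

```python
LEVELS = ["weak", "moderate", "strong"]


def fprp(p_value: float, power: float, prior: float = 0.01) -> float:
    """False-positive report probability.

    FPRP = alpha * (1 - prior) / (alpha * (1 - prior) + power * prior),
    with alpha set to the observed P value.
    """
    false_positive = p_value * (1.0 - prior)
    true_positive = power * prior
    return false_positive / (false_positive + true_positive)


def fprp_evidence(value: float) -> str:
    """Map an FPRP value to strong, moderate, or weak evidence of a true association."""
    if value < 0.05:
        return "strong"
    if value < 0.2:  # assumption: exactly 0.05 is moderate, exactly 0.2 is weak
        return "moderate"
    return "weak"


def adjust_grade(venice_grade: str, fprp_grade: str) -> str:
    """Move the Venice category up one level for strong FPRP evidence
    and down one level for weak FPRP evidence; otherwise leave it unchanged."""
    index = LEVELS.index(venice_grade)
    if fprp_grade == "strong":
        index = min(index + 1, len(LEVELS) - 1)
    elif fprp_grade == "weak":
        index = max(index - 1, 0)
    return LEVELS[index]


if __name__ == "__main__":
    # Hypothetical association: P = 0.005, power = 0.8, Venice category "moderate".
    value = fprp(0.005, 0.8, prior=0.01)
    print(round(value, 3), adjust_grade("moderate", fprp_evidence(value)))
```

With a P value of 0.005 and power of 0.8, for instance, the sketch gives an FPRP of about 0.38 at a prior of 0.01 but about 0.11 at a prior of 0.05, which illustrates how strongly the result depends on the chosen prior.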

Overall Characteristics.
A total of 88 meta-analyses met the eligibility criteria (Figure 1), reporting 86 variants in 40 genes. One hundred and fourteen associations (66 significant associations and 48 nonsignificant associations) were addressed between these variants and the risk of COPD. Except for 9 associations, results were based on at least three original studies. The included meta-analyses had a mean of 1822 cases (range 120-10466) and 4848 controls (range 100-95336). Publication time ranged from 2004 to 2019, and most of them (n = 77, 92.8%) were published since 2010. More detailed information is presented in the tables.

Significant Associations in Meta-Analyses.
As mentioned above, 66 significant associations were identified, including 38 associations obtained from the main meta-analyses and 28 associations obtained from the subgroup meta-analyses. These associations involved 50 variants in 26 genes.
In the main meta-analyses, 43 variants reported ORs higher than 1.0 with a mean value of 1.61 (range 1.14-3.26, median 1.43), while a protective effect was found for the other 7 variants with a mean value of 0.70 (range 0.51-0.86, median 0.76). In the subgroup meta-analyses, 16 variants reported ORs higher than 1.0 with a mean value of 1.66 (range 1.14-2.64, median 1.53), while a protective effect was found for the other 7 variants, with a mean value of 0.65 (range 0.46-0.79, median 0.63). The average number of included studies in the main meta-analyses was 9, while the median was 6 (range 1-38). The average total sample size was 7818, while the median was 4168 (range 220-49520). Most of the meta-analyses contained diverse ethnicities, while only three reported one specific ethnicity, including one for Caucasians and two for Asians. The Venice criteria and FPRP test were used to evaluate the epidemiological credibility of the 66 significant associations (Table 1). First, the Venice criteria were applied. Grade A was given to 36, 21, and 53 associations for amount of evidence, replication of association, and protection from bias, respectively. Grade B was given to 25, 14, and 2 associations for these three aspects, respectively. Grade C was given to 3, 30, and 5 associations for these three aspects, respectively. It is worth noting that the amount of evidence criterion was not applied to one variant, PISZ, because of its low frequency (0.12%). As a result, 10, 17, and 33 associations were categorized as strong, moderate, and weak, respectively. We did not evaluate the other 6 associations because of insufficient information. Second, the FPRP was applied to test the evidence of true association. As a result, 17, 7, and 42 associations were categorized as strong, moderate, and weak, respectively.
Finally, we upgraded cumulative evidence from moderate to strong for GSTM1 null/present in Caucasians, CHRNA rs1051730 in non-Asians, ADAM33 rs612709, SP-D rs721917 in Asians, TNF-α rs1800629 in Asians, and HMOX1 L allele in Asians and from weak to moderate for GSTM1 null/present, GSTM1 null/present in Asians, and TNF-α rs1800629. We downgraded cumulative evidence from strong to moderate for PIMS, ADAM33 rs3918396, and ADAM33 rs3918396 after adjustment for smoking and from moderate to weak for IL-13 rs1800925 in Caucasians, ADAM33 rs2280091 and rs511898, CYP1A1 rs4646903, TNF-α rs1800630, VDBP 1S, COX2 rs20417, IREB2 rs2568494, and ADRB2 rs1042714 in Asians. Altogether, cumulative epidemiological evidence was considered strong for 13 associations involving 10 variants in 8 genes, including GSTM1 null/present in Caucasians, CHRNA rs16969968, rs8034191, and rs1051730, CHRNA rs1051730 in non-Asians, ADAM33 rs612709, ADAM33 rs612709 in Caucasians, SP-D rs721917, SP-D rs721917 in Asians, TNF-α rs1800629 in Asians, VDBP 1F, HHIP rs13118928, and HMOX1 L allele in Asians. Evidence was considered moderate for 8 associations involving 6 variants in 5 genes (PI, GSTM1, ADAM33, TNF-α, and VDBP) and weak for 39 associations involving 40 variants in 23 genes.

Significant Associations in GWASs.
Thirty-two SNPs were reported in GWASs (Table 2), among which 25 SNPs showed an association with increased COPD risk with a mean OR value of 1.17 (range 1.08-1.36, median 1.15), while a protective effect was found for 4 SNPs with a mean OR value of 0.73 (range 0.73-0.74, median 0.73). The GWASs did not provide sufficient data for the remaining 3 SNPs. Unsurprisingly, based on the FPRP test, all 29 SNPs were noteworthy. Given that the Venice criteria are not applicable to GWASs, we did not further evaluate these results. Additionally, we extracted data on another 77 SNPs from the two GWASs excluded for lacking replication phases. Because the study of Sakornsakolpat et al. [21] is one of the largest COPD GWASs to date (Supplementary Table 1), these data might provide an overview of known and well-established COPD SNPs from GWASs.

Discussion
As far as we know, this study is the only one that aims to give a comprehensive assessment of all the reported genetic variants and COPD susceptibility with systematic methods. We retrieved relevant meta-analyses and GWASs and extracted useful data for further evaluation. The Venice criteria and FPRP test were the two major tools. As a result, 10 variants in 8 genes were graded as having strong evidence of association with COPD risk: GSTM1 null/present, CHRNA rs16969968, rs8034191, and rs1051730, ADAM33 rs612709, SP-D rs721917, TNF-α rs1800629, VDBP 1F, HMOX1 L allele, and HHIP rs13118928. Six variants in 5 genes and 40 variants in 23 genes were graded as having moderate and weak evidence, respectively. In addition, 29 SNPs identified in GWASs were noteworthy. There were overlaps between the two groups of SNPs. Four SNPs (CHRNA rs1051730 and rs8034191, SP-D rs721917, and FAM13A rs7671167) reported in GWASs were also investigated by meta-analyses. Except for FAM13A rs7671167, these SNPs were graded as having strong evidence, suggesting that the methods we used can identify promising SNPs provided that a high-quality meta-analysis is included.
GSTM is one of the glutathione S-transferase (GST) cytoplasmic enzymes that metabolize various toxic substances [98]. GSTM1, located on chromosome 1p13.3, is highly expressed in lung tissue. Homozygous deletion of GSTM1 leads to the absence of protein expression and production, preventing detoxification. Previous studies have shown that the null genotype of GSTM1 is related to an increased risk of inflammatory lung diseases and thus plays an important part in COPD development [99]. Our study demonstrated that the GSTM1 null genotype showed strong cumulative evidence for an association with COPD risk in Caucasians, and moderate cumulative evidence was found in "diverse populations" and Asians. In addition, highly consistent positive results were reported in recent meta-analyses [8,9,32,78,79]. The association between the GSTM1 null genotype and COPD risk is well established.
CHRNA3/5, located on chromosome 15q25, encodes subunits of the alpha-nicotinic acetylcholine receptor (nAChR). Gwilt et al. [100] reported that nAChR might play a role in modifying the inflammatory response to smoking. Moreover, variants in CHRNA3/5 were involved in altering mRNA levels of CHRNA5 in lung tissue and influencing receptor response to nicotine agonists [101,102]. These variants may therefore contribute to the development of COPD. Four variants (rs16969968, rs8034191, rs1051730, and rs6495309) in CHRNA3/5 were significant in our study. Except for rs6495309, all of them showed strong cumulative evidence for an association with COPD risk. Interestingly, two variants were also identified as susceptibility loci for COPD by GWASs [25]. However, these strong associations were limited to non-Asians because the distribution of the minor allele varies extensively between ethnicities. Hence, we recommend additional studies to confirm the association between these polymorphisms and COPD risk in Asians.
ADAM33, located on chromosome 20p13, is a member of the "a disintegrin and metalloprotease" (ADAM) family and is closely related to airway hyper-responsiveness and obstruction [103]. Rs612709 in ADAM33 showed a protective effect and was rated as having strong evidence of association with COPD in both "diverse populations" (OR = 0.60, 95% CI = 0.52-0.68) and Caucasians (OR = 0.64, 95% CI = 0.53-0.77). Another meta-analysis with a similar publication year and data source nevertheless yielded an opposing result (OR = 1.46, 95% CI = 1.14-1.87) [41]. A misidentification of the minor allele (rs612709 G>A) may explain this conflict.
SP-D, also known as SFTPD, is an alveolar surfactant-associated protein (SP) and plays a prominent role in the maintenance of lung function and immune regulation [104]. Our study revealed strong evidence for an association between the T allele of rs721917 and COPD risk in both "diverse populations" and Asians based on a total of 1111 and 934 subjects, respectively. Although the sample size is relatively small, the high frequency (>40%) of the T allele makes it possible to achieve sufficient statistical power. However, no significant association was reported in Caucasians because data from Caucasians were scarce. Hence, further studies that focus on Caucasians are warranted.
TNF-α is a pivotal cytokine in inflammation of the lung [105]. Rs1800629 in TNF-α is a G to A variation; carriers of the A allele are more likely to activate the TNF-α promoter region, leading to overexpression of TNF-α [106]. Highly consistent positive results of association between the rs1800629 A allele and COPD risk in Asians but not in Caucasians were reported [8,46,71-75]. Our study also graded rs1800629 as having strong evidence of association with COPD risk in Asians. Nevertheless, no correlation was found between rs1800629 and COPD risk after adjustment for smoking (OR = 1.13, 95% CI = 0.95-1.35). It seems that other factors may contribute to the risk of COPD in smokers. Expanded studies with a sufficient number of smokers are needed.
Heme oxygenase plays an important role in resisting damage caused by oxidative stress. HMOX1, one of the heme oxygenase isoforms, was reported to protect against airway inflammation and emphysema [107,108]. The L allele (long GT repeat sequence) of HMOX1 was rated as having strong evidence of association, with a 2.23-fold increased risk of COPD, based on 923 Asian subjects. The sample size was relatively small, and the frequency of the L allele was low (9.29%). Additionally, only one meta-analysis was retrieved in our study. Although we used the Venice criteria and FPRP test to evaluate the credibility of an association, a firm conclusion should not be drawn until studies with more subjects can be included in the meta-analysis.
The HHIP gene, located on chromosome 4q31, encodes an inhibitory protein for sonic hedgehog. The hedgehog pathway is known to be essential for branching morphogenesis in the developing lung [109]. Zhou et al. [110] demonstrated that HHIP expression at both mRNA and protein levels is reduced in COPD lung tissues. These studies showed that HHIP plays an important role in maintaining normal lung function. HHIP rs13118928 was first reported by Pillai et al. [25] in a GWAS but did not reach genome-wide significance. However, follow-up studies [98,111], including ours, have confirmed the association.
VDBP, located on chromosome 4q12-q13, is also known as GC and binds substantial quantities of 25-hydroxyvitamin D and vitamin D [111]. Two mutations (C⟶T and G⟶A) of SNPs (rs4588 and rs7041) in VDBP result in 3 common isoforms (GC1S, GC1F, and GC2) and different protein products. We found strong and moderate evidence of association between GC1F and the risk of COPD in "diverse populations" and Asians, respectively. Among the 8 meta-analyses we retrieved, only one reported a significant association with decreased risk of COPD in Caucasians [94]. The association between polymorphisms and COPD risk varied between ethnicities. Possible reasons include different genetic backgrounds, different degrees of environmental exposure, and small sample sizes resulting in poor statistical power.
Six variants in 5 genes and 40 variants in 23 genes showed moderate and weak evidence of association with COPD risk in our study, respectively. Among these, two variants (PIMS and ADAM33 rs3918396) were downgraded from strong to moderate and eight variants (IL-13 rs1800925, ADAM33 rs2280091, CYP1A1 rs4646903, TNF-α rs1800630, VDBP 1S, COX2 rs20417, IREB2 rs2568494, and ADRB2 rs1042714) were downgraded from moderate to weak because of high FPRP values (>0.2). As mentioned in "Materials and Methods," three parameters influence the calculated FPRP values. We used a prior probability of 0.01 for FPRP calculations; however, the calculated FPRP values would be smaller if a less stringent prior probability were used (e.g., 0.05) [17]. Therefore, we recommend further investigation of these variants, since significant associations may have been excluded.
One more variant is worth noting: FAM13A rs7671167 showed weak evidence of association with COPD susceptibility because of the strong heterogeneity between studies. It was nevertheless identified by GWASs [27], whose results are more statistically significant and convincing. The Venice criteria are undoubtedly a useful tool for evaluating the cumulative epidemiological evidence of associations between SNPs and diseases. However, true associations may sometimes be missed because evidence is rated "weak" whenever a "C" is assigned to any of the three aspects. Results would be more credible if higher-quality meta-analyses were included or if different weights were assigned to the three aspects of the Venice criteria.
Forty-eight statistically nonsignificant associations were reported for 55 variants in 27 genes. Five variants (TGF-β1 rs1800469, MMP-1 rs1799750, SERPINE2 rs3795879, and CHRNA rs578776 and rs588765) showed no association with COPD risk in meta-analyses including a minimum of 2400 patients and 3000 controls. The MAFs of these variants ranged from 20% to 50%. In other words, these meta-analyses provided greater than 88% power to detect an OR of 1.15 under different genetic models for these variants. In addition, no inconsistency was found among meta-analyses investigating these variants; therefore, we can conclude with reasonable confidence that these variants are not associated with the risk of COPD. Further studies are unlikely to yield significant results.
Limitations of our study must be addressed. Scopus and Web of Science were not searched. Gray literature and non-English literature other than literature in Chinese were not included. Although a systematic literature search was done, some articles may have been overlooked. Only one meta-analysis per variant was chosen for further evaluation, which might introduce bias to some extent. The latest candidate-gene association studies might have been missed. We extracted results from subgroup analyses to address heterogeneity, but heterogeneity remained, and high heterogeneity directly influenced the evaluation.

Conclusions
In summary, combining Venice criteria and FPRP test, cumulative epidemiological evidence of significant associations between genetic variants and COPD risk was evaluated. As a result, 13 variants showed moderate to strong evidence. Our study can provide not only reliable evidence but also helpful clues for further investigations.

Data Availability
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Conflicts of Interest
The authors declare that no conflicts of interest exist.